This is a demo of the GeRnika R package. This document contains examples
that help users understand the functionalities offered by the package,
which include the simulation of tumor clonal data and the visualization
and comparison of tumor phylogenies.
The simulation of tumor clonal data consists of simulating a matrix \(F\) that contains the mutation frequency values of a series of mutations in a collection of tumor biopsies or samples. The matrix \(F\) is calculated as the product of a matrix \(B\) that represents the phylogeny of the tumor, and a matrix \(U\), which contains the clone proportions in each particular sample of that tumor.
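To make the product concrete, here is a minimal base-R sketch with made-up matrices for a linear phylogeny of three clones. It assumes, for illustration only, that \(U\) has samples in rows and clones in columns, that \(B\) has clones in rows and mutations in columns (B[i, j] = 1 if clone i carries mutation j), and that \(F\) is the plain product \(U B\). Any additional scaling that GeRnika may apply internally is not modelled here.

```r
# Hypothetical B: linear phylogeny clone1 -> clone2 -> clone3
# (rows = clones, columns = mutations)
B <- matrix(c(1, 0, 0,
              1, 1, 0,
              1, 1, 1), nrow = 3, byrow = TRUE,
            dimnames = list(paste0("clone", 1:3), paste0("mut", 1:3)))

# Hypothetical U: clone proportions (rows = samples, each row sums to 1)
U <- matrix(c(0.5, 0.3, 0.2,
              0.1, 0.1, 0.8), nrow = 2, byrow = TRUE,
            dimnames = list(paste0("sample", 1:2), paste0("clone", 1:3)))

# Frequency of each mutation in each sample: the summed proportions of
# all clones that carry that mutation. (F_mat avoids masking R's F/FALSE.)
F_mat <- U %*% B
print(F_mat)
```

Mutation 1 reaches a frequency of 1 in both samples because, in this toy tree, every clone descends from clone 1.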
Tumor data can be simulated through the create_instance
function. Its parameters and their usage are described in the
following table:
| Parameter | Description | Type |
|---|---|---|
| n | Number of mutations/clones. | Discrete number |
| m | Number of samples. | Discrete number |
| k | Topology parameter that controls the linearity of the topology. | Continuous number |
| selection | Evolution model followed by the tumor. | "positive" or "neutral" |
| noisy | Add sequencing noise to the values in the \(F\) matrix. | Boolean (TRUE by default) |
| depth | Average number of reads that map to the same locus (in noisy cases), also known as sequencing depth. | Discrete number (30 by default) |
The following example generates a noise-free instance of a tumor that is
composed of 5 clones/mutations, has evolved under neutral evolution and
has a k value of 0.5. Four samples have been taken from it:

```r
library(GeRnika)

# Noise-free instance: 5 clones, 4 samples, k = 0.5, neutral evolution
I <- create_instance(n = 5, m = 4, k = 0.5, selection = "neutral", noisy = FALSE)
```

This function returns the previously mentioned \(F\), \(B\) and \(U\) matrices and an additional \(F\_true\) matrix, which we describe later.
Having shown an example of how to instantiate a tumor, we now analyze the effect of varying the values of the parameters.
k is the parameter that controls the linearity of the topology of the
tumor. Higher values of k lead to more linear phylogenies, while lower
values of k produce more branched trees.
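The intuition behind a linearity parameter can be sketched with a toy attachment model. This is explicitly NOT GeRnika's tree-generation algorithm: in this hypothetical rule, each new node picks its parent with probability proportional to (depth + 1)^k, so k = 0 attaches uniformly (bushier trees) and large k strongly favors the deepest node (chain-like trees). The function name toy_tree is our own.

```r
# Toy attachment model (NOT GeRnika's algorithm): node i (i >= 2) chooses its
# parent among nodes 1..(i-1) with probability proportional to (depth + 1)^k.
toy_tree <- function(n, k) {
  parent <- integer(n)  # parent[1] = 0 marks the root
  depth  <- integer(n)
  for (i in seq_len(n)[-1]) {
    candidates <- seq_len(i - 1)
    w <- (depth[candidates] + 1)^k
    # sample() treats a single number x as 1:x, so handle i = 2 separately
    p <- if (i == 2) 1L else sample(candidates, 1, prob = w)
    parent[i] <- p
    depth[i]  <- depth[p] + 1L
  }
  list(parent = parent, depth = depth)
}

set.seed(1)
# Average node depth over 200 toy trees of 10 nodes (larger = more linear)
avg_depth <- function(k) mean(replicate(200, mean(toy_tree(10, k)$depth)))
avg_depth(0)
avg_depth(8)
```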
```r
# Simulate a tumor with k = 0:
I1 <- create_instance(n = 5, m = 4, k = 0, selection = "neutral")
# Simulate a tumor with k = 8:
I2 <- create_instance(n = 5, m = 4, k = 8, selection = "neutral")
# Create a `Phylotree` class object for each tumor:
tree1 <- B_to_phylotree(B = I1$B)
tree2 <- B_to_phylotree(B = I2$B)
# Plot both trees:
plot(tree1)
plot(tree2)
```
On the left, the plot of tree1 (k = 0); on the right, the plot of
tree2 (k = 8).
As the plots show, the tree on the left is fully branched: it is composed of a root connected directly to all the leaves of the tree. On the right, we can see a more linear tree with just two main branches.
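The difference in shape can also be quantified directly from a \(B\) matrix. The helper below, tree_shape, is hypothetical and written in base R; it assumes that \(B\) has clones in rows and mutations in columns and that clone i introduces mutation i (so diag(B) is all ones). Under that convention, column i marks clone i and all of its descendants, and the number of mutations a clone carries minus one is its depth.

```r
# Hypothetical helper. Assumes B[i, j] = 1 if clone i carries mutation j and
# that clone i introduces mutation i, so column i = clone i plus its descendants.
tree_shape <- function(B) {
  subtree_size <- colSums(B)   # number of clones in the subtree rooted at clone i
  depth <- rowSums(B) - 1      # the root clone carries a single mutation
  c(n_leaves = sum(subtree_size == 1), max_depth = max(depth))
}

# Star: root clone 1 with children 2, 3 and 4
B_star <- diag(4)
B_star[, 1] <- 1
# Chain: 1 -> 2 -> 3 -> 4
B_chain <- matrix(0, 4, 4)
B_chain[lower.tri(B_chain, diag = TRUE)] <- 1

tree_shape(B_star)
tree_shape(B_chain)
```

If GeRnika's \(B\) matrices follow this convention, calling tree_shape(I1$B) and tree_shape(I2$B) would give a numeric counterpart to the two plots.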
Having analyzed the effect of the parameter k on the generated tumor
data, we now analyze the effect of the tumor evolution model.
The selection parameter controls the evolution model the tumor follows: either positive selection or neutral evolution. A positive selection-driven evolution model assumes that a few mutations provide a cell growth advantage, whereas the remaining mutations do not. Conversely, neutral evolution models assume that, in general, none of the mutations provide a significant fitness advantage. As a consequence, tumors under positive selection are dominated by a few clones, while the remaining clones are present in small proportions. In contrast, in tumors under neutral evolution, all the clones are present in similar proportions.
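The contrast between "a few dominant clones" and "similar proportions" can be illustrated with a toy that is NOT GeRnika's simulation procedure: each sample's clone proportions are drawn from a symmetric Dirichlet distribution, using the standard construction of normalizing independent gamma variables. A small concentration parameter alpha (our hypothetical stand-in for positive selection) puts most of the mass on a few clones, while a large alpha (stand-in for neutral evolution) spreads it evenly. The function name rdirichlet_rows is our own.

```r
# Toy only: NOT how GeRnika simulates U.
# Each row is a Dirichlet(alpha, ..., alpha) draw, obtained by normalizing
# independent Gamma(alpha, 1) variables.
rdirichlet_rows <- function(m, n, alpha) {
  g <- matrix(rgamma(m * n, shape = alpha), nrow = m)
  g / rowSums(g)  # rowSums is recycled down the columns: each row divided by its own sum
}

set.seed(123)
U_sel <- rdirichlet_rows(m = 200, n = 5, alpha = 0.2)  # hypothetical "positive selection"
U_neu <- rdirichlet_rows(m = 200, n = 5, alpha = 10)   # hypothetical "neutral evolution"

# Average share of the dominant clone per sample
mean(apply(U_sel, 1, max))
mean(apply(U_neu, 1, max))
```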
The clone proportions are best observed in the \(U\) matrix, since it contains the fraction of each clone in each sample of the tumor. Below, we can see this effect with a few examples:
```r
# Simulate a tumor with positive selection:
Ipos <- create_instance(n = 5, m = 8, k = 0.5, selection = "positive")
# Simulate a tumor with neutral evolution:
Ineu <- create_instance(n = 5, m = 8, k = 0.5, selection = "neutral")
# Plot the heatmaps of the U matrices of the instances:
U_to_heatmap(Ipos$U)
U_to_heatmap(Ineu$U)
```

Heatmaps of the \(U\) matrices of an instance of a tumor under positive selection (top) and neutral evolution (bottom).
In these heatmaps, we can see that in the tumor instance under neutral evolution, most clones are present in all the samples, with only a few exceptions, and most of them have fraction values between 0.05 and 0.4. Conversely, in the tumor instance under positive selection, the largest fraction of the tumor is taken by a few clones: mainly clone 3 and, in lower proportions, clones 1 and 2. In addition, more clones are absent than in the neutral evolution instance; clone 5, for instance, is missing from all the samples.
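The observations above (which clone dominates each sample, and which clones are absent) can also be extracted numerically. The helper summarize_U below is hypothetical and uses only base R; it assumes that \(U\) has samples in rows and clones in columns, and treats proportions at or below a small tolerance as absent. It is shown on a made-up matrix rather than on real GeRnika output.

```r
# Hypothetical helper. Assumes rows = samples and columns = clones.
summarize_U <- function(U, tol = 1e-8) {
  per_sample <- data.frame(
    sample            = seq_len(nrow(U)),
    dominant_clone    = apply(U, 1, which.max),
    dominant_fraction = apply(U, 1, max),
    absent_clones     = rowSums(U <= tol)   # clones missing from each sample
  )
  list(
    per_sample        = per_sample,
    absent_everywhere = which(colSums(U > tol) == 0)  # clones missing from all samples
  )
}

# Made-up example: 2 samples, 4 clones; clone 4 never appears
U_toy <- matrix(c(0.7, 0.2, 0.1, 0,
                  0.1, 0.6, 0.3, 0), nrow = 2, byrow = TRUE)
summarize_U(U_toy)
```

If the orientation assumption holds, summarize_U(Ipos$U) and summarize_U(Ineu$U) would summarize the two heatmaps; otherwise, transpose \(U\) first.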
Having analyzed the difference between the neutral evolution and positive selection-driven evolution models, we now show how noisy instances can be generated and compare them with noise-free instances.
GeRnika can add sequencing noise to the simulated instances. In
practice, this is controlled through the noisy and depth parameters of
the create_instance function. Sequencing noise is added on top of the
noise-free \(F\) matrix, while the \(U\) and \(B\) matrices remain
unchanged. The F slot of the resulting object contains the noisy \(F\)
matrix, while the F_true slot contains the original noise-free \(F\)
matrix. Note that these two slots contain identical matrices when no
sequencing noise is added.
Below, we compare the noisy \(F\) matrix of an instance with its noise-free counterpart, \(F\_true\).
```r
# Simulate a tumor with sequencing noise added:
Inoisy <- create_instance(n = 5, m = 8, k = 0.5, selection = "neutral", noisy = TRUE, depth = 5)
# Show the heatmap of the absolute difference between the F and F_true matrices:
F_to_heatmap(abs(Inoisy$F - Inoisy$F_true))
```

The effect of noise.
The heatmap above shows the difference between the \(F\) matrix and the \(F\_true\) matrix of a tumor instance, i.e., the noise added to the original variant allele frequency (VAF) values of our samples. The amount of noise is controlled by the depth parameter, which replicates the effect that sequencing depth has on the noise level. This is explained in more detail in the following subsection.
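Besides the heatmap, the noise can be summarized with a few numbers. The helper noise_summary below is hypothetical and written in base R: it reports the mean absolute, maximum absolute and root-mean-square difference between two frequency matrices of the same dimensions. It is demonstrated on made-up 2 × 2 matrices standing in for Inoisy$F and Inoisy$F_true.

```r
# Hypothetical helper: compare a noisy and a noise-free frequency matrix.
noise_summary <- function(F_noisy, F_true) {
  stopifnot(identical(dim(F_noisy), dim(F_true)))
  d <- abs(F_noisy - F_true)
  c(mean_abs = mean(d), max_abs = max(d), rmse = sqrt(mean(d^2)))
}

# Made-up matrices standing in for Inoisy$F and Inoisy$F_true
F_true_toy  <- matrix(c(0.4, 0.2, 1.0, 0.4), nrow = 2)
F_noisy_toy <- matrix(c(0.5, 0.2, 0.9, 0.4), nrow = 2)
noise_summary(F_noisy_toy, F_true_toy)
```

With a real instance, the equivalent call would be noise_summary(Inoisy$F, Inoisy$F_true).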
The sequencing read depth is the average number of reads that map to the same locus (section of the genome). Therefore, the higher the sequencing depth, the more accurate the VAF values and, thus, the lower the noise.
The animations below show how the error introduced by noise evolves for simulations with different depth values. The first animation presents the progression of the error for two tumors composed of 10 clones and 2 samples, one under positive selection and one under neutral evolution. The second animation shows the progression of the error for two tumors (again, one under positive selection and one under neutral evolution) with 100 clones and 10 samples.
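The relationship between depth and error can be reproduced with a toy read-count model. This is an assumption for illustration and not necessarily GeRnika's noise model: each locus receives a Poisson-distributed number of reads with mean equal to the depth, the variant reads are binomial given the true frequency, and the noisy VAF is their ratio. The function add_read_noise and the random "true" frequencies are hypothetical.

```r
# Toy read-count noise model (assumption, not necessarily GeRnika's):
# reads per locus ~ Poisson(mean_depth); variant reads ~ Binomial(reads, VAF)
add_read_noise <- function(F_true, mean_depth) {
  d <- pmax(rpois(length(F_true), lambda = mean_depth), 1)  # avoid dividing by zero
  reads <- rbinom(length(F_true), size = d, prob = as.vector(F_true))
  matrix(reads / d, nrow = nrow(F_true), dimnames = dimnames(F_true))
}

set.seed(42)
# Made-up "true" frequencies: 10 samples x 100 mutations
F_true_toy <- matrix(runif(10 * 100), nrow = 10)
depths <- c(5, 30, 100, 1000)
# Mean absolute error between noisy and true frequencies at each depth
err <- sapply(depths, function(dp) mean(abs(add_read_noise(F_true_toy, dp) - F_true_toy)))
names(err) <- depths
print(round(err, 4))
```

Under this model the expected error shrinks roughly with the square root of the depth, which matches the qualitative trend described above.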